Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks

نویسندگان

Anh Nguyen

Dimitrios Kanoulas

Luca Muratore

Darwin G. Caldwell

Nikolaos G. Tsagarakis

چکیده

We present a new method to translate videos to commands for robotic manipulation using Deep Recurrent Neural Networks (RNN). Our framework first extracts deep features from the input video frames with a deep Convolutional Neural Networks (CNN). Two RNN layers with an encoderdecoder architecture are then used to encode the visual features and sequentially generate the output words as the command. We demonstrate that the translation accuracy can be improved by allowing a smooth transaction between two RNN layers and using the state-of-the-art feature extractor. The experimental results on our new challenging dataset show that our approach outperforms recent methods by a fair margin. Furthermore, we combine the proposed translation module with the vision and planning system to let a robot perform various manipulation tasks. Finally, we demonstrate the effectiveness of our framework on a full-size humanoid robot WALK-MAN.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

متن کامل

Learning Lexical Entries for Robotic Commands using Crowdsourcing

Robotic commands in natural language usually contain lots of spatial descriptions which are semantically similar but syntactically different. Mapping such syntactic variants into semantic concepts that can be understood by robots is challenging due to the high flexibility of natural language expressions. To tackle this problem, we collect robotic commands for navigation and manipulation tasks u...

متن کامل

Autoencoder with recurrent neural networks for video forgery detection

Video forgery detection is becoming an important issue in recent years, because modern editing software provide powerful and easy-to-use tools to manipulate videos. In this paper we propose to perform detection by means of deep learning, with an architecture based on autoencoders and recurrent neural networks. A training phase on a few pristine frames allows the autoencoder to learn an intrinsi...

متن کامل

Classifying sport videos with deep neural networks

This project aims to apply deep neural networks to classify video clips in applications used to streamline advertisements on the web. The system focuses on sport clips but can be expanded into other advertisement fields with lower accuracy and longer training times as a consequence. The main task was to find the neural network model best suited for classifying videos. To achieve this the field ...

متن کامل

Tracking of Humans in Video Stream Using LSTM Recurrent Neural Network

In this master thesis, the problem of tracking humans in video streams by using Deep Learning is examined. We use spatially supervised recurrent convolutional neural networks for visual human tracking. In this method, the recurrent convolutional network uses both the history of locations and the visual features from the deep neural networks. This method is used for tracking, based on the detect...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1710.00290 شماره

صفحات -

تاریخ انتشار 2017

Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks

نویسندگان

چکیده

منابع مشابه

Speech Emotion Recognition Using Scalogram Based Deep Structure

Learning Lexical Entries for Robotic Commands using Crowdsourcing

Autoencoder with recurrent neural networks for video forgery detection

Classifying sport videos with deep neural networks

Tracking of Humans in Video Stream Using LSTM Recurrent Neural Network

عنوان ژورنال:

اشتراک گذاری